Introduction

This document introduces and documents the R code developed to build a generalised “severity index” for the UNHCR in Guatemala. It also serves as the source for an R package based on this code, which is built using the fusen package.

The work began with the objective of building a severity index at the municipal level for Guatemala. However, over the course of the project, the scope expanded to a generalised index framework that can be applied to other countries as well, with Guatemala serving as a case study or demo. The result is therefore a set of generalised code “modules” which can be used to build severity indexes from any set of user input data. In fact, the end objective is to use these code modules as the basis for an app which lets non-R-users build their own severity indexes from a GUI, probably using Shiny.

The code consists of five “modules”, which represent the basic steps a user is expected to follow in building their own severity index. To summarise, they are:

  1. Data input
  2. Indicator analysis and selection
  3. Index construction and visualisation
  4. Reweighting
  5. Export

Each module is summarised in its own section of this documentation. The modules have a loose order of operation, and some modules require that earlier ones are run first. For example, data input is mandatory, and steps 2-5 cannot be run without it. Indicator analysis and selection, however, is optional: the following modules can still be run without it. Reweighting (4) cannot be run without index construction (3). Export can be run at any time, as long as data has been input.
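
The dependencies described above can be written down as a small lookup table. The following is only a sketch of how an app might check them: `module_prereqs` and `can_run_module()` are hypothetical names and are not part of the package.

```r
# Prerequisites of each module, as described in the text above.
# 1 = data input, 2 = indicator analysis and selection,
# 3 = index construction, 4 = reweighting, 5 = export.
module_prereqs <- list(
  "1" = integer(0),
  "2" = 1L,
  "3" = 1L,         # module 2 is optional, so only data input is required
  "4" = c(1L, 3L),  # reweighting needs a constructed index
  "5" = 1L          # export only needs input data
)

# Hypothetical helper: TRUE if `module` may run, given the modules already run
can_run_module <- function(module, completed) {
  all(module_prereqs[[as.character(module)]] %in% completed)
}

can_run_module(4, completed = c(1, 2))  # FALSE: the index has not been built
can_run_module(4, completed = c(1, 3))  # TRUE
can_run_module(5, completed = 1)        # TRUE
```

In an app, a check of this kind could be used to enable or disable tabs, although the final behaviour depends on the app specifications.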

Each module is organised as a collection of functions. Modules are expected to roughly correspond with “tabs” in an app, such that the user navigates from tab 1 (data input) through to tab 5 (export), following the order of operations. The dependence of one module on another can be managed in the app phase and according to the final specifications.

The code here rests heavily on the COINr package, an R package for building and analysing composite indicators. The modules are built to act as a simplified interface between the user (via the app) and the COINr package. The idea is to guide the user to build their own severity index with a deliberately narrow range of options along the way. In this sense, the concept lies halfway between a classical data exploration/visualisation tool for a single composite indicator (where the user can only explore an existing index) and a generalised GUI for building composite indicators (where a user might be able to build anything they want). This is a deliberate choice to keep the user interface simple.

As it stands, the code is loosely organised into an R package via the fusen package. Hence, all functions are defined in the present R Markdown document. At the app-building stage (if done via Shiny), the code is expected to be reorganised into a single Shiny-style R package.

For each module, a summary and general comments are given at the beginning of each section. Individual function description is given inside the function chunks themselves so that this is also viewable using ?function_name.

Although unit tests are generated by the fusen package, the functions here are not thoroughly unit tested themselves due to time/scope constraints of the present phase. However, many are thinnish wrappers for COINr functions which are well-covered by unit tests.

All the main functions are covered by examples, although for helper functions (called by other functions) I have omitted the examples.

DEMO DATA

The data set used to demonstrate the package is a set of 54 indicators covering 340 municipalities in Guatemala. It is found at ./inst/data_input/data_module-input.xlsx. The input is deliberately kept as an .xlsx file because that is the expected input from users. Here, we simply call up the file path of the example data. In the following section, the data will be read in.

# Make your dataset file available to the current Rmd
pkgload::load_all(path = here::here(), export_all = FALSE)
#>  Loading BuildIndex

# You will be able to read your example data file in each of your function examples and tests as follows
example <- system.file("data_input/data_module-input.xlsx", package = "BuildIndex") 

Note that the example data here does not represent a finished product in terms of the Guatemalan index: although the data set already went through a number of iterations, further adjustments may be made in terms of the selected indicators, index structure and possibly also the methodological choices. However, this represents a working data set for the purposes of this documentation.

MODULE 1: DATA INPUT

Objective: To allow the user to input their data, which can then be used for the rest of the analysis.

Input(s): File path pointing to an Excel spreadsheet with data in the format specified by the template.

Output(s):

  • Front end: confirmation of successful data entry, or else helpful error messages. Summary of what was input, e.g. number of indicators, number of units. An interactive framework plot and/or a table of data.
  • Back end: An assembled coin.

This module consists of three functions:

  • f_data_input() which reads the Excel spreadsheet and imports the data into R, and outputs a coin.
  • f_print_coin() which prints a summary of the input data, to be output to the user.
  • f_plot_framework() which outputs an interactive plot of the indicator framework.

The app is expected to be set up so that, on running f_data_input(), f_print_coin() and f_plot_framework() are automatically run to immediately show the user what they uploaded. The functions are documented in the following subsections (inside the roxygen2 metadata).

As mentioned previously, the code here is deliberately set up to only give limited options to the user. In the input file, the user has the option to define:

  • Any number of indicators to include
  • The indicator values for any number of units (municipalities or similar)
  • Which “category” each indicator belongs to
  • A short code and longer name for each indicator
  • The directionality of each indicator (positive or negative)
  • The initial weight to be assigned to each indicator
  • Names for each municipality

The codes for each municipality must be the standardised “Admin 2” codes in order to be recognised in the mapping stage. In the app phase, this check should be built into the data input function.
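
As an illustration of what such a check might look like, the sketch below flags unit codes that are malformed or duplicated. The function `check_admin2_codes()` is hypothetical, and the pattern ("GT" followed by four digits) is only inferred from codes such as GT0101 in the example data; it is an assumption, not an official specification.

```r
# Hypothetical check: flag unit codes that do not look like Guatemalan
# Admin 2 codes, or that appear more than once.
# The default pattern is an assumption based on the example data.
check_admin2_codes <- function(ucodes, pattern = "^GT[0-9]{4}$") {
  ucodes <- trimws(as.character(ucodes))
  bad <- ucodes[is.na(ucodes) | !grepl(pattern, ucodes) | duplicated(ucodes)]
  list(ok = length(bad) == 0, problems = unique(bad))
}

res <- check_admin2_codes(c("GT0101", "GT0102", "gt0103", "GT0102"))
res$ok        # FALSE
res$problems  # "gt0103" (wrong format) and "GT0102" (duplicated)
```

For other countries, the pattern would need to be replaced or the codes matched against the unit codes in the shapefile instead.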

Note that although the user can input their own data, the structure of the index is fixed and is defined by a data frame stored at inst/data_input/iMeta_aggs.RDS. This is “hard coded” into the data input function. Therefore, changing the structure, if needed, would amount to changing the stored data frame and altering the input template. Users can however decide which “categories” each indicator belongs to.
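
To give an idea of what such a structure table could contain, the sketch below builds the two upper levels (dimensions and index) using the codes that appear in the example output. It assumes that the stored data frame follows the column layout COINr uses for indicator metadata (iMeta); the names and weights are placeholders, the category level is omitted, and the actual table is the one in inst/data_input/iMeta_aggs.RDS.

```r
# Illustrative only: upper levels of the index structure in an
# iMeta-style layout. Names, directions and weights are placeholders.
iMeta_aggs_sketch <- data.frame(
  iCode     = c("Amenazas", "Cap_Resp", "Sit_SocEc", "MVI"),
  iName     = c("Amenazas", "Cap_Resp", "Sit_SocEc", "MVI"),
  Level     = c(3, 3, 3, 4),
  Parent    = c("MVI", "MVI", "MVI", NA),  # the index itself has no parent
  Direction = 1,
  Weight    = 1,
  Type      = "Aggregate"
)

# Round trip through an .RDS file (written to a temporary location here)
tmp <- tempfile(fileext = ".RDS")
saveRDS(iMeta_aggs_sketch, tmp)
restored <- readRDS(tmp)
restored[restored$Level == 3, c("iCode", "Parent")]
```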

The input spreadsheet is still a work in progress and could be further optimised in the app phase. This would likely also entail adjusting f_data_input().

Data input

In the following, the example data is read in to create a coin called “MVI”.

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

The following example shows the output of this function:

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

f_print_coin(MVI)
#> ----------
#> Your data:
#> ----------
#> Input:
#>   Units: 340 (GT0101, GT0102, GT0103, ...)
#>   Indicators: 41 (S.A.1, S.A.3, S.A.4, ...)
#> 
#> Structure:
#>   Level 1 Indicator: 41 indicators (A.D.1, A.M.1, A.M.2, ...) 
#>   Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...) 
#>   Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc) 
#>   Level 4 Index: 1 groups (MVI)

Plot Framework

Run the following chunk to see the framework plot:

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

f_plot_framework(MVI)

MODULE 2: Indicator analysis and selection

Objective: To flag any statistical issues with indicators and allow the user to remove indicators if they want to (without having to edit their input file).

Input(s): This is a two-stage process: the analysis and the indicator selection. For the former there is no input. For the latter the input is the set of indicators to remove, if any. In the code this is a vector of indicator codes, but in the app the indicators will be selected interactively.

Output(s):

  • Front end: Analysis table at first step (as DT). Record of indicators removed.
  • Back end: Analysis table as data frame, then modified coin after removal of indicators, if any.

This module aims to run an automated statistical analysis of the indicators input by the user, and flag any issues. Flags are generated for any indicators that have:

  • Data availability below 66%
  • More than half of the observations sharing the same value
  • Possible outliers (absolute skew > 2 AND kurtosis > 3.5)
  • Collinearity (correlation > 0.9) with any indicators within the same category
  • Negative correlation (correlation < -0.4) with any indicators in the same category

The thresholds here are hard-coded into the function, since they are not expected to be accessible to the user, but can be adjusted by editing the source code.
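
The flagging rules above can be sketched in base R as follows. This is not the package's implementation: `flag_indicator()` and `flag_correlations()` are hypothetical names, and the skewness and kurtosis estimators are simple moment-based versions (with kurtosis taken as excess kurtosis), which may differ from the estimators that COINr uses.

```r
# Simple moment-based estimators (an assumption, see the note above)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
excess_kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Flags for a single indicator (numeric vector, possibly with NAs)
flag_indicator <- function(x) {
  obs <- x[!is.na(x)]
  c(
    low_availability = mean(!is.na(x)) < 0.66,
    same_value       = max(table(obs)) / length(obs) > 0.5,
    outliers         = abs(skew(obs)) > 2 && excess_kurt(obs) > 3.5
  )
}

# Correlation flags for the indicators of one category (data frame)
flag_correlations <- function(df) {
  r <- cor(df, use = "pairwise.complete.obs")
  diag(r) <- NA  # ignore the correlation of an indicator with itself
  data.frame(
    indicator = colnames(df),
    collinear = apply(r, 1, function(v) any(v > 0.9, na.rm = TRUE)),
    neg_corr  = apply(r, 1, function(v) any(v < -0.4, na.rm = TRUE)),
    row.names = NULL
  )
}

flag_indicator(c(1:6, rep(NA, 4)))  # only 60% of values available
flag_indicator(c(1:49, 1000))       # one extreme value

# Made-up category: b moves exactly with a, c moves exactly against both
cat_df <- data.frame(a = 1:50, b = 2 * (1:50) + 1, c = -(1:50))
flag_correlations(cat_df)
```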

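The module consists of four main functions:

  • f_analyse_indicators() which runs the indicator analysis.
  • f_display_indicator_analysis() which displays the results of the analysis as a table.
  • f_remove_indicators() which removes indicators from the coin.
  • f_add_indicators() which adds previously removed indicators back.

There are also several supporting functions. It is expected that the app will call f_analyse_indicators() and use f_display_indicator_analysis() to display the results. The user will then be able to remove or add back any indicators (flagged or otherwise), e.g. by selecting rows in the table, which will call f_remove_indicators() and f_add_indicators() on the back end.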
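
The remove/add-back logic amounts to keeping track of a set of removed indicator codes. The sketch below shows only this bookkeeping in base R, with hypothetical helper names; the package functions additionally regenerate the coin, which is not shown here.

```r
# A few indicator codes from the example data, for illustration
all_codes <- c("A.M.1", "A.M.2", "S.G.3", "S.A.1")
removed <- character(0)

# Hypothetical helpers operating on the set of removed codes
remove_codes <- function(removed, codes) union(removed, codes)
add_codes    <- function(removed, codes) setdiff(removed, codes)

removed <- remove_codes(removed, c("S.G.3", "A.M.1"))
removed <- add_codes(removed, "S.G.3")  # add one back

removed                      # "A.M.1"
setdiff(all_codes, removed)  # indicators currently included
```

Because the set is the only state, removing an already removed indicator, or adding back one that was never removed, has no effect.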

Analyse indicators

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw


MVI <- f_analyse_indicators(MVI)

Gather correlations

#f_gather_correlations()

Display indicator analysis

The following example generates the table:

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

MVI <- f_analyse_indicators(MVI)

f_display_indicator_analysis(MVI)

Highlight Data Table

#f_highlight_DT()

Remove indicators

The following example shows how indicators can be removed:

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

# call print method
MVI
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 340 (GT0101, GT0102, GT0103, ...)
#>   Indicators: 41 (S.A.1, S.A.3, S.A.4, ...)
#>   Denominators: 0 ()
#>   Groups: 0 (none)
#> 
#> Structure:
#>   Level 1 Indicator: 41 indicators (A.D.1, A.M.1, A.M.2, ...) 
#>   Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...) 
#>   Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc) 
#>   Level 4 Index: 1 groups (MVI) 
#> 
#> Data sets:
#>   Raw (340 units)

MVI <- f_remove_indicators(MVI, c("S.G.3", "A.M.1"))
#> coin has been regenerated using new specs.
# compare with previous
MVI
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 340 (GT0101, GT0102, GT0103, ...)
#>   Indicators: 39 (S.A.1, S.A.3, S.A.4, ...)
#>   Denominators: 0 ()
#>   Groups: 0 (none)
#> 
#> Structure:
#>   Level 1 Indicator: 39 indicators (A.D.1, A.M.2, A.M.4, ...) 
#>   Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...) 
#>   Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc) 
#>   Level 4 Index: 1 groups (MVI) 
#> 
#> Data sets:
#>   Raw (340 units)

Add indicators

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

# remove first
MVI <- f_remove_indicators(MVI, c("S.G.3", "A.M.1"))
#> coin has been regenerated using new specs.
# print method
MVI
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 340 (GT0101, GT0102, GT0103, ...)
#>   Indicators: 39 (S.A.1, S.A.3, S.A.4, ...)
#>   Denominators: 0 ()
#>   Groups: 0 (none)
#> 
#> Structure:
#>   Level 1 Indicator: 39 indicators (A.D.1, A.M.2, A.M.4, ...) 
#>   Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...) 
#>   Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc) 
#>   Level 4 Index: 1 groups (MVI) 
#> 
#> Data sets:
#>   Raw (340 units)

# add one back
MVI <- f_add_indicators(MVI, "S.G.3")
#> coin has been regenerated using new specs.
MVI
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 340 (GT0101, GT0102, GT0103, ...)
#>   Indicators: 40 (S.A.1, S.A.3, S.A.4, ...)
#>   Denominators: 0 ()
#>   Groups: 0 (none)
#> 
#> Structure:
#>   Level 1 Indicator: 40 indicators (A.D.1, A.M.2, A.M.4, ...) 
#>   Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...) 
#>   Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc) 
#>   Level 4 Index: 1 groups (MVI) 
#> 
#> Data sets:
#>   Raw (340 units)

MODULE 3: Index construction and visualisation

Objective: To build the index from the indicators selected in the previous step and show the results as table/map/bar chart.

Input(s): Possibly none from the user. If the methodology is fixed, there is no need for any input except perhaps which visualisation to use.

Output(s):

  • Front end: Results table, bar chart, map
  • Back end: Modified coin.

The index construction here uses the following steps:

  1. Treat any outliers (this can be optional)
  2. Normalise
  3. Aggregate, using one of four currently-enabled methods.
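
The three steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions rather than the package's implementation, which relies on COINr: the winsorisation rule, the 0-100 min-max scale and the helper names are all assumptions, and only the arithmetic mean is shown for aggregation. Missing values are not handled.

```r
# Step 1 (optional): winsorise. While the distribution looks skewed, move the
# most extreme point to the next most extreme value, up to a maximum number
# of points.
winsorise <- function(x, max_winsorisation = 5, skew_thresh = 2, kurt_thresh = 3.5) {
  skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
  kurt <- function(v) mean((v - mean(v))^4) / mean((v - mean(v))^2)^2 - 3
  n_done <- 0
  while (n_done < max_winsorisation &&
         isTRUE(abs(skew(x)) > skew_thresh && kurt(x) > kurt_thresh)) {
    if (skew(x) > 0) {
      i <- which.max(x)
      x[i] <- max(x[-i])
    } else {
      i <- which.min(x)
      x[i] <- min(x[-i])
    }
    n_done <- n_done + 1
  }
  x
}

# Step 2: min-max normalisation onto [lower, upper]; direction = -1 flips
normalise <- function(x, direction = 1, lower = 0, upper = 100) {
  x <- x * direction
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

# Step 3: weighted arithmetic mean across columns
aggregate_amean <- function(scores, weights) {
  as.vector(scores %*% (weights / sum(weights)))
}

# Made-up data: ind1 has one extreme value, ind2 has negative direction
raw <- data.frame(ind1 = c(1:19, 500), ind2 = 20:1)
treated <- as.data.frame(lapply(raw, winsorise))
normalised <- data.frame(
  ind1 = normalise(treated$ind1, direction = 1),
  ind2 = normalise(treated$ind2, direction = -1)
)
index <- aggregate_amean(as.matrix(normalised), weights = c(1, 1))
range(index)
```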

Overall, the module consists of three main functions, plus some other helpers:

Benefit of doubt aggregation

#

Wroclaw Taxonomic aggregation

#

Helper to get codes of any groups with only one child

#

Build index

The example simply builds the index and writes to the coin.

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

MVI1 <- f_build_index(coin = MVI, 
                     agg_method = "a_amean", #   (arithmetic mean),
                     max_winsorisation = 5,
                     skew_thresh = 2,
                     kurt_thresh = 3.5)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

MVI2 <- f_build_index(coin = MVI, 
                     agg_method = "a_gmean", # (geometric mean),
                     max_winsorisation = 5,
                     skew_thresh = 2,
                     kurt_thresh = 3.5)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

MVI3 <- f_build_index(coin = MVI, 
                     agg_method = "a_bod", # (benefit of doubt via Compind package) 
                     max_winsorisation = 5,
                     skew_thresh = 2,
                     kurt_thresh = 3.5)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

MVI4 <- f_build_index(coin = MVI, 
                     agg_method = "a_wroclaw", # (Wroclaw Taxonomic Method via Compind)
                     max_winsorisation = 5,
                     skew_thresh = 2,
                     kurt_thresh = 3.5)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

Display results table

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

MVI <- f_build_index(MVI)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated
f_display_results_table(MVI)

Plot map

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

shp_path <-  system.file("data_input/gtm_admbnda_adm2_ocha_conred_20190207.shp",
                                            package = "BuildIndex")


MVI <- f_build_index(MVI)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

f_plot_map(coin = MVI, 
           dset = "Aggregated",
           iCode = "MVI", 
           shp_path = shp_path  )
## When using your own shapefile, put it in the data input folder and point to it, e.g.:
# shp_path = here::here("inst/data_input", "gtm_admbnda_adm2_ocha_conred_20190207.shp")

Plot map v2

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

MVI <- f_build_index(MVI)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

shp_path <-  system.file("data_input/gtm_admbnda_adm2_ocha_conred_20190207.shp",
                                            package = "BuildIndex")

shp_name <- "ADM2_ES"
shp_ucode <- "ADM2_PCODE"

f_plot_map2(coin = MVI, 
           dset = "Aggregated",
           shp_path = shp_path,
           shp_name = shp_name,
           shp_ucode = shp_ucode )
## When using your own shapefile, put it in the data input folder and point to it, e.g.:
# shp_path = here::here("inst/data_input", "gtm_admbnda_adm2_ocha_conred_20190207.shp")

Generate results

#

MODULE 4: Reweighting

Objective: To allow users to adjust weights manually to their preferences, and see the results interactively change.

Input(s): Weights, which can be set just at dimension level, or at dimension and category level. Indicator-level adjustment is not recommended because it would result in a messy UI and would probably confuse the user.

Output(s):

  • Front end: No specific outputs in this module. The outputs of the previous module are reused, i.e. the map and table outputs.
  • Back end: Modified coin.

The idea here is to enable some interactive weight controls, such as sliders. The user adjusts the weights to their preference, and the results automatically update. This can probably be in the same “tab” as the results from the previous module, so on adjusting weights, the map or table will update.

As mentioned in discussions on this, allowing too much freedom with weight adjustment could lead to confusion or tuning weights to suit desired results. We could impose limits on weights, e.g. +/- 50%, which could help a bit here. This can be done by constraining the UI.
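
Although the limits would most likely be enforced in the UI (e.g. by the range of each slider), the same constraint can be expressed in code. The following is a minimal sketch with a hypothetical helper, `constrain_weights()`, assuming initial weights of 1 for each dimension.

```r
# Hypothetical helper: keep each weight within +/- max_change of its
# initial value
constrain_weights <- function(w_new, w_init, max_change = 0.5) {
  lower <- w_init * (1 - max_change)
  upper <- w_init * (1 + max_change)
  pmin(pmax(w_new, lower), upper)
}

w_init <- c(Amenazas = 1, Cap_Resp = 1, Sit_SocEc = 1)    # assumed initial weights
w_user <- c(Amenazas = 3, Cap_Resp = 0.2, Sit_SocEc = 1)  # values requested by the user

w_ok <- constrain_weights(w_user, w_init)
w_ok              # 1.5, 0.5, 1.0
w_ok / sum(w_ok)  # the same weights rescaled to sum to 1
```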

Change weights

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

MVI <- f_build_index(MVI)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated
MVI2 <- f_change_weights(MVI, w = list(Amenazas = 1.5, Cap_Resp = 0.5))

# we can use a COINr function for a comparison
COINr::compare_coins(MVI, MVI2, dset = "Aggregated", iCode = "MVI") |>
  head()
#>    uCode coin.1 coin.2 Diff Abs.diff
#> 1 GT0101    340    239  101      101
#> 2 GT0102    225    178   47       47
#> 3 GT0103    223    246  -23       23
#> 4 GT0104    275    315  -40       40
#> 5 GT0105    165    117   48       48
#> 6 GT0106    242    208   34       34

Get equal weights

#

Get last weights

#

MODULE 5: Export

Objective: To export all results to Excel.

Input(s): Just the command to export.

Output(s):

  • Front end: An Excel spreadsheet with results.
  • Back end: None

COINr has a function to export to Excel. However, this outputs everything in the coin, which could confuse users and includes a lot of information that is probably not relevant. Instead, this module returns a simplified output which has the main results, a record of which indicators were selected, the weights used, and the data sets generated at each construction stage for the record.

In more detail the output spreadsheet is as follows:

  • Results table (scores)
  • Results table (ranks)
  • Index structure
  • Analysis table (indicator analysis)
  • Weights used
  • Data sets generated at each stage
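
One way to think of this output is as a named list of data frames, one per sheet. The sketch below assembles such a list from made-up values (not real results) and, to stay within base R, writes each element to a CSV file in a temporary folder; writing a real .xlsx workbook requires an additional package. The convention that rank 1 is the highest score is an assumption.

```r
# Made-up scores for three municipalities (not real results)
scores <- data.frame(uCode = c("GT0101", "GT0102", "GT0103"),
                     MVI   = c(40, 55, 47))

# Ranks derived from the scores; rank 1 = highest score (an assumption)
ranks <- data.frame(uCode = scores$uCode,
                    MVI   = rank(-scores$MVI, ties.method = "min"))

# Placeholder weights table
weights <- data.frame(iCode  = c("Amenazas", "Cap_Resp", "Sit_SocEc"),
                      Weight = 1)

# One list element per output sheet
sheets <- list(Scores = scores, Ranks = ranks, Weights = weights)

# Base R stand-in for the Excel export: one CSV file per sheet
out_dir <- file.path(tempdir(), "index_export")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(sheets)) {
  write.csv(sheets[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
list.files(out_dir)
```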

This module only consists of one function: f_export_to_excel().

Export to Excel

MVI <- f_data_input(file_path = system.file("data_input/data_module-input.xlsx",
                                            package = "BuildIndex") )
#> Removed indicators with no data points: A.A.1, A.M.5, A.M.6, A.V.5, S.T.3, C.C.2, C.C.3, C.C.4, C.J.1, C.J.3, C.S.1, C.S.2, C.S.3
#> Removed categories containing no indicators: Acc_Hum
#> iData checked and OK.
#> iMeta checked and OK.
#> Written data set to .$Data$Raw

## When using your own data, create a data-raw folder and put your data input .xlsx file there
# MVI <- f_data_input(here::here("data-raw", "data_module-input.xlsx"))
MVI <- f_analyse_indicators(MVI)
MVI <- f_build_index(MVI)
#> Written data set to .$Data$Treated
#> Written data set to .$Data$Normalised
#> Written data set to .$Data$Aggregated

f_export_to_excel(coin = MVI, 
                  fname = here::here("inst", "index_export.xlsx"))

Generate technical report

# f_export_report(datafolder = "data-raw", ## Default folder in which to put your data
#                 data = "data_module-input.xlsx", ## Name of the data file
#                 shp = "gtm_admbnda_adm2_ocha_conred_20190207.shp", ## Name of the shapefile used to create the map
#                 folder = "Report")

Generate technical presentation

# f_export_prez(datafolder = "data-raw", ## Default folder in which to put your data
#               data = "data_module-input.xlsx", ## Name of the data file
#               shp = "gtm_admbnda_adm2_ocha_conred_20190207.shp", ## Name of the shapefile used to create the map
#               folder = "Report")